USING BIWEIGHT M-ESTIMATES IN THE TWO-SAMPLE PROBLEM
PART  1:  SYMMETRIC  POPULATIONS 

Karen  Kafadar 

Statistical  Engineering  Division 
National  Bureau  of  Standards 

Key Words and Phrases: Student's t statistic; Monte Carlo simulation;
robust confidence intervals; robustness of efficiency;
robustness of validity.

ABSTRACT 

We propose replacing the usual Student's t statistic, which
tests for equality of means of two distributions and is used to
construct a confidence interval for the difference, by a
biweight-"t" statistic. The biweight-"t" is the ratio of the
difference  of  the  biweight  estimates  of  location  from  the  two 
samples  to  an  estimate  of  the  standard  error  of  this  difference. 
Three  forms  of  the  denominator  are  evaluated:  weighted  variance 
estimates  using  both  pooled  and  unpooled  scale  estimates,  and 
unweighted  variance  estimates  using  an  unpooled  scale  estimate. 

Monte  Carlo  simulations  reveal  that  resulting  confidence  intervals 
are  highly  efficient  on  moderate  sample  sizes,  and  that  nominal 
levels  are  nearly  attained,  even  when  considering  extreme 
percentage  points. 

1.  INTRODUCTION 

The  use  of  Student's  t  in  constructing  confidence  intervals 
for  the  difference  in  location  of  two  populations  is  a  common 
practice.  It  is  well  known  that  this  procedure  is  uniformly  most 
powerful unbiased when the underlying populations follow Gaussian
distributions with the same variance (Lehmann 1959). When the
distributions are in fact even slightly stretched-tailed, however,
studies show that, while the Student's t interval nearly maintains
its validity under the null hypothesis (Yuen and Dixon 1973, Lee
and D'Agostino 1976), the power may be substantially reduced (Yuen
and Dixon 1973). (More recently, see Benjamini 1980 for conditions
under which one-sample Student's t is conservative.) In order to
achieve  "robustness  of  efficiency"  in  addition  to  "robustness  of 
validity"  (as  defined  in  Tukey  and  McLaughlin  1963),  this  study 
proposes  the  use  of  biweights  in  a  two-sample  "t"-like  statistic, 
which  we  shall  call  biweight-"t".  The  two-sample  problem  raises 
the  issues  of  combining  information  on  scale  of  the  data  and  on 
variance  of  the  numerator  of  biweight-"t".  We  shall  attempt  to 
judge  when  such  borrowing  of  scale  information  may  be  justified. 
This  report  concentrates  on  small  to  moderate  sizes  of  samples  from 
symmetric  populations;  the  unsymmetric  case  is  treated  in  a 
forthcoming  paper.  Section  2  deals  with  the  case  of  equal  sample 
sizes.  Section  3  considers  unequal  sample  sizes,  for  which 
variance  estimates  may  be  weighted  by  their  sample  sizes.  Section 
4  examines  the  performance  of  biweight-"t"  when  the  samples  have 
different scales. A brief comparison of biweight-"t" intervals
with  other  familiar  procedures  is  made  in  Section  5,  and  Section  6 
concludes  with  an  example  and  strategies  for  the  two-sample  case. 

2.  EQUAL  SAMPLE  SIZES. 

2.1 Form of two-sample biweight-"t" and concepts.

Let $X_{1j}, \ldots, X_{n_j j} \sim F_j((x-\mu_j)/\sigma_j)$, $j = 1, 2$, denote samples
from two symmetric populations. Then the two-sample biweight-"t"
takes the form:

$$ \text{"t"}_{bi} = (T_1 - T_2)/S $$

where each $T_j$ is a biweight estimate of location and the squared
denominator estimates the variance of the numerator:

$$ S^2 = \widehat{\mathrm{Var}}(T_1 - T_2). $$

For a definition of the biweight and its associated variance, the
reader is referred to Mosteller and Tukey (1977). For a single
sample $y_1, \ldots, y_n$, the only major difference between their
calculation of the biweight estimate of location and that used
here is in the choice of scale: 6·MAD (median absolute deviation)
has been replaced by $6 s_{bi}$, where

$$ \{u_k\} = (u_1, \ldots, u_n) = \{(y_k - \mathrm{median})/(9\,\mathrm{MAD})\} $$

$$ s_{bi}^2 = n \cdot q(\{u_k\}) $$

$$ q(\{u_k\}) = \sum_{k=1}^{n} \psi^2(u_k) \Big/ \Big\{ \Big[ \sum_{k=1}^{n} \psi'(u_k) \Big] \cdot \max\Big[ 1,\; -1 + \sum_{k=1}^{n} \psi'(u_k) \Big] \Big\} \qquad (1) $$

and the psi function is given by

$$ \psi(u) = \begin{cases} u(1-u^2)^2 = u \cdot w(u), & |u| \le 1 \\ 0, & \text{else.} \end{cases} $$

One then solves for T iteratively via the equation

$$ T^{(h)} = \sum_{k=1}^{n} y_k w(u_k) \Big/ \sum_{k=1}^{n} w(u_k), \qquad u_k = (y_k - T^{(h-1)})/(6 s_{bi}). \qquad (2) $$

The iteration starts with the median and ceases when the change is
less than one part in the fourth decimal place. An estimate of the
variance of T may be obtained from a finite-sample approximation to
the theoretical asymptotic variance (cf. Huber 1981, p. 45):

$$ S_T^2 = \widehat{\mathrm{Var}}(T) = q(\{u_k\}) \qquad (3) $$

where the $\{u_k\}$ are defined in (2). (The motivation for these
changes is discussed in Kafadar 1981, henceforth referred to as
[K81].)
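The one-sample computation in (1) to (3) can be sketched in a few lines of Python. This is a minimal illustration, not the author's program. Two points are assumptions rather than statements from the text: q is multiplied by the square of the divisor used to standardize the residuals, so that the results come out in the units of the data (this matches the Mosteller and Tukey form of the biweight variance), and the stopping rule is read as an absolute change below 1e-4. The function names are illustrative.

```python
import statistics


def biweight_parts(u):
    """Return (w, psi, psi') of the biweight at a standardized residual u."""
    if abs(u) >= 1.0:
        return 0.0, 0.0, 0.0
    w = (1.0 - u * u) ** 2
    return w, u * w, (1.0 - u * u) * (1.0 - 5.0 * u * u)


def q(us, unit):
    """Equation (1) on residuals `us`, times unit**2 (assumed rescaling)."""
    parts = [biweight_parts(u) for u in us]
    num = sum(psi * psi for _, psi, _ in parts)
    dsum = sum(dpsi for _, _, dpsi in parts)
    if dsum <= 0.0:
        raise ValueError("sum of psi' must be positive")
    return unit * unit * num / (dsum * max(1.0, dsum - 1.0))


def biweight_scale(y):
    """s_bi from residuals about the median, standardized by 9*MAD."""
    med = statistics.median(y)
    mad = statistics.median([abs(v - med) for v in y])
    if mad == 0.0:
        raise ValueError("MAD is zero; the scale is undefined")
    unit = 9.0 * mad
    return (len(y) * q([(v - med) / unit for v in y], unit)) ** 0.5


def biweight_location(y, c=6.0, tol=1e-4, max_iter=100):
    """Iterate equation (2) from the median; return (T, variance estimate)."""
    unit = c * biweight_scale(y)
    t = statistics.median(y)
    for _ in range(max_iter):
        ws = [biweight_parts((v - t) / unit)[0] for v in y]
        t_new = sum(w * v for w, v in zip(ws, y)) / sum(ws)
        converged = abs(t_new - t) < tol
        t = t_new
        if converged:
            break
    return t, q([(v - t) / unit for v in y], unit)
```

With a sample such as 1, 2, ..., 9 plus a single wild value of 1000, the wild value receives zero weight and the estimate stays near the center of the remaining nine points, whereas the sample mean is pulled above 100. Degenerate inputs (for example, more than half the observations tied) are rejected rather than handled.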

When we have two samples, we compute $T$ and $S_T$ for each sample.
If we denote these by $T_j$ and $S_j$ ($j = 1, 2$), our two-sample
biweight-"t" statistic then takes the form

$$ \text{"t"} = (T_1 - T_2)/(S_1^2 + S_2^2)^{1/2}. \qquad (4) $$

The variance estimates $S_j^2$ will be weighted by sample size in
Section 3. In the remaining sections, we will drop the subscript
on $\text{"t"}_{bi}$, as the form of "t" will always involve the biweight
estimates as defined above.

2.2  Evaluation  criteria. 

Performance of biweight-"t" will be evaluated on three
different distributions:

° Gaussian
° One-Wild (n-1 observations from N(0,1),
  1 unidentified observation from N(0,100))
° Slash (N(0,1) deviate / independent Uniform[0,1] deviate).

These  three  situations  are  likely  to  cover  a  reasonably  broad  range 
of  stretched-tailed  behavior  (Rogers  and  Tukey  1972). 
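The three sampling situations are easy to generate. The sketch below is only an illustration using Python's standard random module; the study itself used the sample sets of the Princeton Robustness Study. It assumes that N(0,100) denotes variance 100 (standard deviation 10), and the function names are illustrative.

```python
import random


def gaussian_sample(n, rng):
    """n independent N(0,1) deviates."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def one_wild_sample(n, rng):
    """n-1 deviates from N(0,1) plus one from N(0,100), in random order."""
    sample = [rng.gauss(0.0, 1.0) for _ in range(n - 1)]
    sample.append(rng.gauss(0.0, 10.0))  # assumed: 100 is the variance
    rng.shuffle(sample)                  # the wild value is unidentified
    return sample


def slash_sample(n, rng):
    """N(0,1) deviate divided by an independent uniform deviate."""
    # 1 - rng.random() lies in (0, 1], which avoids division by zero.
    return [rng.gauss(0.0, 1.0) / (1.0 - rng.random()) for _ in range(n)]
```

Passing an explicit generator, for example `random.Random(1)`, makes a run reproducible.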

Robustness  of  efficiency  may  be  evaluated  in  several  ways.  In 
this  study,  the  success  of  biweight-"t"  will  be  measured  primarily 
in  terms  of  "efficiency"  of  the  expected  confidence  interval  length 
(ECIL),  i.e., 

$$ \mathrm{eff}(\alpha) = [\mathrm{ECIL}_{min}(\alpha)/\mathrm{ECIL}_{actual}(\alpha)]^2 $$

where $\mathrm{ECIL}_{actual}(\alpha)$ was defined by Gross (1976) as

$$ \mathrm{ECIL}_{actual}(\alpha) = 2\,(\alpha/2\ \text{\%-point of "t"}) \cdot \mathrm{ave}(\text{denominator of "t"}), $$

and $\mathrm{ECIL}_{min}(\alpha)$ is the shortest confidence interval we could expect
for the situation at hand. For the Gaussian, these are, of
course, Student's t intervals; an approximation, derived in [K81],
is used for $\mathrm{ECIL}_{min}(\alpha)$ in the One-Wild and Slash situations.
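Because the efficiency is a squared ratio of lengths, it converts directly into a relative interval width. The following sketch only restates the definitions above as arithmetic; the function names are illustrative.

```python
def ecil(critical_point, mean_denominator):
    """Expected confidence interval length: 2 * critical point * ave(denominator)."""
    return 2.0 * critical_point * mean_denominator


def ecil_efficiency(ecil_min, ecil_actual):
    """Squared ratio of the shortest attainable length to the actual length."""
    return (ecil_min / ecil_actual) ** 2


def relative_width(efficiency):
    """Actual length divided by minimum length for a given efficiency (0 to 1)."""
    return efficiency ** -0.5
```

For instance, an interval 20% wider than the minimum has efficiency (1/1.2)^2, or about 69%, so efficiencies near 70% and "no more than 20% wider" describe the same performance.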

Furthermore,  for  practical  ease  of  use,  we  wish  to  approximate 
the distribution of biweight-"t" by one from a standard family of
distributions.  The  most  likely  candidate  here  is  Student's  t,  with 
some  chosen  number  of  degrees  of  freedom.  This  chosen  number  may 
be  determined  by  comparing  the  calculated  percent  point  of  "t"  to  a 
Student's  t  table;  i.e.,  the  matching  of  ("t"  critical  point,  a)  to 
(degrees  of  freedom).  The  critical  points  of  the  distribution  were 
all  computed  via  a  Monte  Carlo  swindle,  the  details  of  which  may  be 
found in Kafadar (1979). The sets of samples were those used in
the Princeton Robustness Study (Andrews et al. 1972); each
simulated situation involved either 640 or 1000 samples of sizes 5,
10, and 20.
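The matching of a ("t" critical point, tail area) pair to a number of degrees of freedom can be sketched numerically. This is a minimal illustration and not the procedure used in the study: it assumes the tail areas are one-sided, evaluates the Student's t tail by Simpson's rule, and bisects on the degrees of freedom. The function names and the search range [1, 1000] are illustrative choices.

```python
import math


def t_pdf(x, df):
    """Density of Student's t with (possibly fractional) df degrees of freedom."""
    log_c = (math.lgamma((df + 1.0) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1.0) / 2.0 * math.log1p(x * x / df))


def t_upper_tail(x, df, steps=2000):
    """P(t_df > x) for x >= 0, using Simpson's rule on [0, x] (steps even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * t_pdf(i * h, df)
    return 0.5 - total * h / 3.0


def matched_df(critical_point, alpha, lo=1.0, hi=1000.0):
    """Degrees of freedom whose upper-alpha point equals critical_point."""
    for _ in range(60):
        mid = (lo + hi) / 2.0
        # The tail area at a fixed point shrinks as df grows.
        if t_upper_tail(critical_point, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

As a check against Table I, `matched_df(2.002, 0.025)` lands in the high 50s, in line with the tabulated 57.9 for the Gaussian case with samples of size 20. If the critical point is smaller than the corresponding Gaussian point, no Student's t matches and the search simply runs to the upper end of the range.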

2.3  Asymptotic  Distribution  of  "t". 

That "t" of (4) has an asymptotic Gaussian distribution is
clear by the following argument: for the jth population,

$$ n^{1/2}(T_j - \mu_j) \xrightarrow{d} N\big(0,\; E_j\psi^2/(E_j\psi')^2\big), $$

where the subscript of E denotes the distribution; e.g.,

$$ E_1\psi^2 = \int \psi^2[(x - T_1)/(c s_1)]\, dF_1(x) \qquad (5) $$

for an arbitrary constant c (e.g., c=6 in (2)). Hence,

$$ n^{1/2}[(T_1 - T_2) - (\mu_1 - \mu_2)] \xrightarrow{d} N\Big(0,\; \sum_{j=1}^{2} E_j\psi^2/(E_j\psi')^2\Big). $$

Since (cf. Carroll 1978)

$$ n S_j^2 \xrightarrow{p} E_j\psi^2/(E_j\psi')^2, $$

we have by Slutsky's theorem that

$$ [(T_1 - T_2) - (\mu_1 - \mu_2)]/(S_1^2 + S_2^2)^{1/2} \xrightarrow{d} N(0,1). \qquad (6) $$
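The Slutsky step can be written out explicitly. The display below is a sketch under the assumptions of this section (independent samples of common size n); the shorthand $V_j$ for the asymptotic variance of the jth biweight is introduced here for convenience and is not notation from the paper.

```latex
\[
\frac{(T_1 - T_2) - (\mu_1 - \mu_2)}{(S_1^2 + S_2^2)^{1/2}}
  = \underbrace{\frac{n^{1/2}\,[(T_1 - T_2) - (\mu_1 - \mu_2)]}{(V_1 + V_2)^{1/2}}}_{\xrightarrow{d}\; N(0,1)}
    \cdot
    \underbrace{\left[\frac{V_1 + V_2}{n S_1^2 + n S_2^2}\right]^{1/2}}_{\xrightarrow{p}\; 1},
\qquad
V_j = \frac{E_j\psi^2}{(E_j\psi')^2}.
\]
```

The first factor is asymptotically standard Gaussian because the two samples are independent, and the second converges in probability to one because each $n S_j^2$ is consistent for $V_j$; the product therefore has the limit stated in (6).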

2.4  Borrowing  Scales. 

Since each of the biweights in the numerator and each of the
variance estimates in the denominator of "t" requires an estimate
of scale, we may consider a pooled estimate if we believe that both
samples have common scale. As shown in [K81], such a pooled
estimate in the one-sample "t" can substantially reduce the
variability in our results.

Table I gives the results of two-sample "t" when both samples
have the same size and underlying distribution. (Additional
percent points are available from the author.) Both biweights in
the numerator have been scaled by $s_{bor}$, where

$$ s_{bor} = [(n_1 + n_2) \cdot q(\{u_{i1}, u_{i2}\})]^{1/2} \qquad (7) $$

$$ u_{ij} = (x_{ij} - T_j^{(0)})/(9\, s_j^{(0)}) $$

$$ s_j^{(0)} = \mathrm{med}_i\, |x_{ij} - T_j^{(0)}|, \qquad T_j^{(0)} = \mathrm{med}_i\, x_{ij}. $$

The subscript refers to a scale estimate which "borrows" width
information from more than one sample.


                               Table I

             Biweight-"t" with pooled scales (F_1 = F_2)

tail            Gaussian               One-Wild                Slash
area         %-pt     df    eff     %-pt     df    eff     %-pt     df    eff

A) n_1 = n_2 = 20
.05         1.663   71.1   97.4    1.662   91.0   95.0    1.677   47.8   69.4
.025        2.002   57.9   97.3    1.996   67.2   95.0    2.004   54.7   71.2
.001        3.279   43.9   96.5    3.290   43.3   94.3    3.228   62.2   78.6
.0001       4.080   41.9   96.9    4.111   38.7   93.3    3.984   55.8   81.6
.00001      4.813   41.4   97.1    4.894   36.8   91.4    4.720   49.5   82.1

B) n_1 = n_2 = 10
.05         1.692   33.3   93.7    1.693   32.7   86.6    1.700   28.4   70.9
.025        2.053   26.8   93.5    2.052   27.0   86.9    2.020   40.8   75.5
.001        3.537   20.7   93.0    3.571   19.3   86.7    3.277   46.5   92.7
.0001       4.546   19.9   93.4    5.054   16.9   84.0    4.341   33.6   99.8
.00001      5.581   19.6   93.8    5.955   16.0   81.5    4.923   35.4  101.5

C) n_1 = n_2 = 5
.05         1.849    8.4   91.1    1.769   13.2   60.5    1.790   11.4   68.1
.025        2.348    7.3   86.9    2.248    9.4   59.4    2.269    8.8   77.7
.001        7.267    4.0   34.5    6.483    4.5   32.3    7.927    3.7  113.1
.0001      16.658    3.6   13.5   16.326    8.7   12.1   28.573    2.9   47.0
.00001     25.061    3.9   11.4   22.755    4.1   14.0   66.471    2.7    1.8

Table I reveals extremely high performance for $n_j \ge 10$. In
particular, the resulting confidence intervals for the Gaussian are
only trivially less efficient than if we knew the true underlying
distribution (93% or higher) and are seldom more than 20% wider
than the minimum ECIL for any situation. Furthermore, we are
entitled to the full degrees of freedom in our approximation to a
Student's t distribution, across a broad range of $\alpha$-levels.

To be conservative, we might wish to approximate "t" by a
Student's t on nine-tenths of the nominal degrees of freedom (ndf =
$n_1 + n_2 - 2$). For $\alpha > .01$ and $n_j \ge 10$, the actual error rate is only
slightly smaller than the nominal (no less than 85% of the
nominal). As we go further into the tails, however, the actual
error rates may be as low as 30% of the nominal (even lower for
Slash, n=10). While the robustness of classical procedures for
extreme $\alpha$-levels has not been investigated, a comparison with the
values in Lee and D'Agostino (1976) indicates that this procedure
is highly robust of validity at $\alpha = .05$; presumably this robustness
extends to the extreme $\alpha$-levels as well.

2.5 Different distributions: separate scale estimates.

All  three  distributions  in  this  study  are  derived  from  the 
Gaussian  with  unit  variance.  This  fact,  however,  does  not  imply 
that  a  pooled  scale  is  appropriate  when  our  samples  come  from 
different  populations,  as  Table  11(A)  reveals.  When  our  two 
samples  do  not  both  have  the  same  underlying  distributional  shape, 
ECIL  efficiency  is  still  high,  but  the  equivalent  degrees  of 
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Table  II 


Biweight-”t *’  with  pooled  and  unpooled  scales  (F|*F2) 

tail  Gaussian,  One-Wild  Gaussian,  Slash  One-Wild,  Slash 

area  *-pt  di 

A)  Pooled  Scales 

eft 

ett 

5T 

err 

L)  ni”n2’20 

.05  1.665 

76.1 

96. 1 

1.734 

17.9 

91.6 

1.725 

20.0 

87.6 

.025  2.001 

58.9 

96.0 

2.090 

19.4 

93.3 

2.072 

22.4 

90.0 

.001  3.303 

41.0 

94.9 

3.523 

21.3 

96.4 

3.456 

24.7 

94.7 

.0001  4.125 

37.5 

93.7 

4.468 

21.7 

96.8 

4.376 

24.3 

95.3 

.00001  4.921 

35.5 

92.3 

5.389 

22.3 

96.7 

5.298 

24.0 

94.5 

2)  nj*n2“10 

.05  1.772 

12.9 

99.0 

1.759 

14.3 

92.0 

1.749 

15.6 

84.4 

.025  2.171 

12.4 

97.1 

2.  125 

15.5 

94.6 

2.  105 

17.5 

87.4 

.001  3.902 

12.3 

89.7 

3.695 

15.8 

98.5 

3.631 

17.4 

92.5 

.0001  5.176 

12.6 

85.2 

4.943 

14.4 

91.7 

4.895 

14.9 

84.8 

.00001  6.557 

12.7 

81.2 

6.343 

13.7 

82.3 

6.361 

13.7 

74.2 

3)  nj-n2”5 

.05  2.213 

3.5 

54.7 

2.163 

3.8 

62.7 

1.975 

5.5 

50.8 

.025  3.137 

3.1 

42.5 

3.005 

3.4 

58.2 

2.641 

4.6 

50.8 

.001  38.305 

1.8 

1.2 

25.437 

2.0 

9.1 

11.178 

2.9 

31.6 

.0001  84.351 

2.0 

0.5 

96.043 

2.0 

1.0 

28.161 

2.9 

5.8 

.00001  130.382  2.3 

B)  Unpooled  Scales 

0.5 

166.625 

2.4 

0.5 

48.053 

3.0 

3.7 

1)  nj=n2“20 

.05  1.672 

56.7 

95.7 

1.703 

27.2 

72.9 

1.694 

32.1 

72.6 

.025  2.010 

49.2 

95.6 

2.041 

30.4 

74.9 

2.028 

36.1 

74.7 

.001  3.316 

38.6 

94.5 

3.368 

31.9 

80.9 

3.327 

37.1 

81.7 

.0001  4.142 

36.1 

93.3 

4.251 

29.3 

81.9 

4.157 

34.9 

84.4 

.00001  4.940 

34.7 

91.9 

5.145 

27.5 

81.3 

4.962 

33.7 

86. 1 

2)  nj=*n2“10 

.05  1.785 

11.8 

97.1 

1.768 

13.3 

73.5 

1.760 

14.2 

71.7 

.025  2.186 

11.7 

95.2 

2.122 

15.8 

76.7 

2.111 

16.9 

74.8 

.001  3.924 

12.1 

88.2 

3.681 

16. 1 

80. 1 

3.b62 

16.6 

78.2 

.0001  5.196 

12.4 

84.1 

4.937 

14.5 

74.2 

4.958 

14.3 

71.0 

.00001  6.574 

12.6 

80.4 

6.349 

13.4 

66.3 

6.439 

13.3 

62.2 

freedom  is  low. 

This 

not  surprising 

,  for 

(7)  is 

designed  to 

estimate  a  common  scale.  A  comparison  of  the  distributions  based 
on  a  different  characteristic  of  width,  such  as  a  pseudo-variance 
quantity,  shows  that  the  Slash  Is  considerably  wider  than  either 
the  Gaussian  or  the  One-Wild  (cf.  Rogers  and  Tukey  1972). 

The scale estimate (7) borrows from both samples and is used
in four places in our "t"-statistic. In general, of course, we
shall not know when we are entitled to borrow. More importantly,
this pooled scale violates the independence assumption in the
numerator. It is true that the asymptotic distribution (6) depends
only upon the consistency (not the dependence) of the scale
estimates in the numerator ($T_1 - T_2$). However, we shall be applying
this result to relatively small sample sizes. While the dependence
between numerator and denominator did not affect the efficiency of
a biweight-"t" in the one-sample problem (cf. [K81]), it is not
clear how the increased dependence in the numerator of "t" will
alter its distribution on finite sample sizes.

To illustrate the effect of eliminating this dependence
between the variables in the numerator, Panel B of Table II shows
the results based on unpooled scales. Curiously, despite the
incompatibility of scales in the Gaussian-Slash and Gaussian-One-Wild
pairs, biweight-"t" with pooled scales gives higher ECIL
efficiency but slightly fewer degrees of freedom when $n_1 = n_2 = 20$.
Overall, we could be fairly confident in a comparison of two-sample
"t" to a Student's t on 0.9(ndf), if we knew when and when not to
borrow.

One  criterion  on  which  to  base  a  decision  applies  a  weight 
function  to  the  logarithms  of  the  scale  estimates.  This  is 
explored  in  Kafadar  (1980);  preliminary  results  on  small  sample 
sizes  are  encouraging.  Although  formal  tests  of  equal  variances 
are  beyond  the  scope  of  this  paper,  one  might  decide  to  borrow  on 
the basis of the relative sizes of $s_{bi}$ for the two samples. In the
absence of a formal test, overall we conclude that "t" based on
pooled scales allows roughly .9(ndf) for all but the extreme
$\alpha$-levels, and roughly .8(ndf) for the unpooled case.

When $n_1 = n_2 = 5$, degrees of freedom are substantially lower than
the nominal 8, and ECIL efficiency is often below 50%, even for the
Gaussian case, where the biweight typically performs well. An
explanation for this is explored in [K81]: the occasional presence
of one or more observations which receive zero weight will lead to
misleading estimates of scale, thereby affecting the distribution
of "t". For small samples, the distribution of "t" can be
characterized much more usefully by conditioning on the values of
the sum of the biweight weights. These conditional results will
not be shown here but are available from the author.

3. UNEQUAL SAMPLE SIZES.

This  case  is  treated  separately,  because  of  the  dependence  of 
the  variance  estimates  on  sample  size  in  the  denominator. 

3.1 Asymptotic distribution of analogous two-sample statistic.

If we believe that our biweights in the numerator have the
same variance, a common assumption in the usual two-sample
approach, we may wish to pool our variance estimates in a
"borrowed" (via mean squares) denominator:

$$ \widehat{\mathrm{Var}}(T_1 - T_2) = S_{bor}^2 = \Big[ (n_1 + n_2 - 2)^{-1} \sum_{j=1}^{2} n_j (n_j - 1) S_j^2 \Big] (n_1^{-1} + n_2^{-1}). \qquad (8) $$

A borrowed-"t" then takes the form:

$$ \text{"t"}_{bor} = [(T_1 - T_2) - (\mu_1 - \mu_2)]/S_{bor}. \qquad (9) $$

In computing $T_j$ and $S_j$ in (8) and (9), one may (or may not) choose
to use a pooled scale estimate as in (7).

The denominator in (9) weights the estimated variances of the
statistics in the numerator according to the sample size. Such an
approach would not be reasonable if $\mathrm{Var}(T_1) \ne \mathrm{Var}(T_2)$. For such
unequal variance cases, we consider separate estimates of the
variance in an unborrowed denominator (cf. Welch's approach to the
Behrens-Fisher problem, Welch 1938):

$$ \text{"t"}_{unbor} = [(T_1 - T_2) - (\mu_1 - \mu_2)]/(S_1^2 + S_2^2)^{1/2}, \qquad (10) $$

since the variance of the numerator may also be estimated by

$$ S_{unbor}^2 = S_1^2 + S_2^2. \qquad (11) $$

This distinction did not of course arise in Section 2, for then (8)
and (11) reduce to the same formula.
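The two denominators can be compared directly once each sample's location estimate and variance estimate are in hand. The sketch below simply transcribes the borrowed variance (8) and the unborrowed variance (11); the function names and the numerical inputs in the usage note are illustrative, not values from the study.

```python
def borrowed_variance(s1_sq, s2_sq, n1, n2):
    """Equation (8): pooled mean square times (1/n1 + 1/n2)."""
    pooled = (n1 * (n1 - 1) * s1_sq + n2 * (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    return pooled * (1.0 / n1 + 1.0 / n2)


def unborrowed_variance(s1_sq, s2_sq):
    """Equation (11): sum of the separate variance estimates."""
    return s1_sq + s2_sq


def biweight_t(t1, t2, variance, mu_diff=0.0):
    """Two-sample "t": centered difference of locations over its standard error."""
    return (t1 - t2 - mu_diff) / variance ** 0.5
```

With equal sample sizes the two variances coincide for any inputs, for example `borrowed_variance(0.07, 0.18, 10, 10)` equals `unborrowed_variance(0.07, 0.18)`. With unequal sizes they agree only when $n_j S_j^2$ is the same in both samples, which is the equal-variance situation the borrowed form is designed for.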

That the two forms of two-sample "t" do indeed have asymptotic
Gaussian distributions under the null hypothesis can be seen as
follows. Following the lines of the argument in Section 2.3, we
know that

$$ U_i = \sqrt{n_i}\,(T_i - \mu_i)\,[E_i\psi^2/(E_i\psi')^2]^{-1/2} \xrightarrow{d} N(0,1), \quad i = 1, 2 \qquad (12) $$

where the notation for the expectations is defined in (5).
Furthermore, if $F_1 = F_2$, then the denominators in (12) are the same
for both samples, so

$$ \hat\sigma^2(n_1, n_2) = [n_1(n_1 - 1)S_1^2 + n_2(n_2 - 1)S_2^2]/(n_1 + n_2 - 2) \xrightarrow{p} E\psi^2/(E\psi')^2. $$

Hence, we have that $\text{"t"}_{bor}$ may be written

$$ \text{"t"}_{bor} = \big[ (U_1/\sqrt{n_1} - U_2/\sqrt{n_2})\,(n_1^{-1} + n_2^{-1})^{-1/2} \big] \cdot \big[ \sqrt{E\psi^2/(E\psi')^2} \,/\, \hat\sigma(n_1, n_2) \big] $$

$$ = \big[ U_1 \{1 + (n_1/n_2)\}^{-1/2} - U_2 \{1 + (n_2/n_1)\}^{-1/2} \big] \cdot \big[ \sqrt{E\psi^2/(E\psi')^2} \,/\, \hat\sigma(n_1, n_2) \big]. $$

If $n_1 \to \infty$ and $n_2 \to \infty$ in such a way that $n_1/n_2 \to K < \infty$,

$$ [1 + (n_1/n_2)]^{-1/2} \to (1 + K)^{-1/2}, \qquad [1 + (n_2/n_1)]^{-1/2} \to [K/(1 + K)]^{1/2}. $$

Hence, using Slutsky's theorem in conjunction with the convergence
in distribution in (12), we conclude that $\text{"t"}_{bor}$ has an asymptotic
Gaussian distribution. If $F_1 \ne F_2$, then $\text{"t"}_{unbor}$ is appropriate,
for which the proof is similar.

3.2  Borrowing  versus  unborrowing:  scales  and  denominators. 

When we no longer have equal sample sizes, we might be
cautious and prefer not to borrow estimates of either scale or
biweight "variance". We know that such a cautious procedure may be
quite wasteful of valuable information, especially when one sample
has only five observations. On the other hand, biweight variances
need not be the same for all distributions, and unwarranted
borrowing in such cases may prove misleading. In this section we
investigate the effects of various borrowing possibilities.

For the sake of brevity and for ease of comparison, we shall
limit our attention to the efficiency of biweight-"t" at $\alpha = .001$
as representative of the behavior of "t" over the range $.00001 < \alpha < .05$.
Table III shows these results, where the denominator of "t"
is:

A) $S_{bor}$, borrowed scales: "complete borrowing";

B) $S_{bor}$, unborrowed scales: "incomplete borrowing";

C) $S_{unbor}$, unborrowed scales: "complete unborrowing".

When the distributions are the same, there is nearly always
advantage to complete borrowing, as seen most dramatically when
both underlying densities are Gaussian. In these cases, we may
again approximate the distribution of "t" by a Student's t with the
nominal degrees of freedom. When the distributions are the same, a
conservative matching would be 0.9(ndf). When one distribution is
Slash, incomplete borrowing appears slightly more successful.


Table III

Matched degrees of freedom and ECIL efficiencies at $\alpha = .001$
for two-sample biweight-"t": Unequal sample sizes (1)

(1) Standard errors for critical points from which degrees of
freedom were matched and ECIL efficiencies were calculated fell
in the range 0.028 to 0.331 for $\alpha = .001$.

(2) $F_j$ represents underlying distribution for sample j:
G = Gaussian, W = One-Wild, S = Slash.

Finally, we remark that there are some cases for which "t" in
any of the three forms appears totally unsuccessful (e.g., n=5
Slash, with anything else). This is primarily due to the nature of
small samples: there is a chance (about 5% in the Gaussian) that
one or two bona fide observations will occur far enough away from
the bulk of the data so as to be inappropriately downweighted by
any reasonably robust procedure. When the smaller sample is
restricted to be such that the sum of the biweight weights is high,
efficiencies on the biweight-"t" intervals are slightly higher than
those in Panel B. A solution may well depend on an appropriate use
of the weight distribution in these small samples.

4.  UNEQUAL  WIDTHS. 

4.1 Unborrowed denominators.

When our samples have different scales, a Welch-like
unborrowed denominator of the form (11) is a safe (but conservative)
approach. To evaluate the performance of biweight-"t" in the
presence of unequal widths, we multiply the observations of one of
the distributions by either $\sqrt{2}$ or 2, yielding "variance" ratios
between 2 and 4. A moderate difference in scales was chosen to
provide some indication of the effect in practical applications.

In Table IV, we show some trials of $\text{"t"}_{unbor}$ either when
$F_1 \ne F_2$, $n_1 = n_2$ or when $F_1 = F_2$, $n_1 \ne n_2$. (As in Table III, only the
results for $\alpha = .001$ are shown.) Notice that our previous matching
of the distribution to a Student's t on 0.8(ndf) for unpooled
scales would be conservative. This is similar to the conservative
nature of Welch's unborrowed t-statistic (e.g., as shown in Lee and
D'Agostino 1976, Welch 1938). Approximating the distribution of
$\text{"t"}_{unbor}$ by a Student's t on 0.9(ndf) instead, we see from Table IV
that the actual levels are still often less than half the nominal.
In terms of robustness of efficiency, however, ECIL efficiency
typically exceeds 50%.

As a final comment on the interval problem for samples of
varying widths, we mention the concept of transformation, a
familiar data analytic tool in such situations. When comparing
several batches of data, Tukey (1977, chapters 3, 4) draws attention
to the importance of choosing a re-expression of the data for which
the amounts of spread are roughly the same across batches. Such
re-expression may be useful in dealing with the "unequal variances"
problem of this section. For example, Anscombe's (1948) variance
stabilizing transformations of Poisson data have been shown to
produce more similarity in spread. The results of biweight-"t"
discussed in Sections 2 and 3 (perhaps even the completely borrowed
"t") may then be applied successfully to such re-expressed data.


Table IV

Matched degrees of freedom and ECIL efficiency
for biweight-"t" at $\alpha = .001$: $\sigma_1 \ne \sigma_2$

(1) $F_j$ represents underlying distribution for sample j:
G = Gaussian; W = One-Wild; S = Slash.

(2) Indicates that biweight-"t" distribution is shorter-tailed
than Gaussian.

(3) Actual $\alpha$ (using the .001 point of Student's t on .9(ndf)) > nominal $\alpha = .001$.

5. COMPARISON WITH CLASSICAL AND NONPARAMETRIC INTERVALS.

Many practicing statisticians are reluctant to compute robust
estimators or are satisfied with distribution-free methods. Even
among users of robust methods, there has been disagreement
concerning the efficiency of the biweight over other robust estimators.

To compare the performance of biweight-"t" with Student's t, a
nonparametric and a Huber-type "t" interval, Table V presents the
results from a separate Monte Carlo study. For each run, 1000
Gaussian or One-Wild samples of size 5, 10, or 20 were generated.
Subroutine HH from Andrews et al. (1972) computed the Huber
location estimate, and its standard error was estimated via (3) but
where $\psi(u)$ was replaced by

$$ \psi_H(u) = \begin{cases} u, & |u| \le 1.5 \\ 1.5\,\mathrm{sgn}(u), & |u| > 1.5. \end{cases} $$

Nonparametric intervals based on the Wilcoxon rank sum test are
described in Lehmann (1975). Student's t, Huber-"t", and
biweight-"t" all used completely unborrowed denominators.
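The difference between the two psi functions is easiest to see side by side. The sketch below writes the Huber psi with cutoff 1.5 in its usual odd form (clipping at plus or minus 1.5, which is an assumption about the sign convention) next to the biweight psi of Section 2.1; the function names are illustrative.

```python
def huber_psi(u, k=1.5):
    """Huber psi: identity on [-k, k], held constant at +/-k outside."""
    return max(-k, min(k, u))


def huber_psi_prime(u, k=1.5):
    """Derivative of the Huber psi: 1 inside the cutoff, 0 outside."""
    return 1.0 if abs(u) <= k else 0.0


def biweight_psi(u):
    """Biweight psi: u(1 - u^2)^2 on [-1, 1], zero outside (redescending)."""
    return u * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0
```

The Huber psi never returns to zero, so a wild observation keeps a bounded but nonzero influence on the estimate, while the biweight psi gives observations beyond the cutoff no influence at all.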

Table V reveals that Student's t is highly inefficient when
even one of the samples is mildly contaminated (One-Wild, n=20).
Biweight-"t" intervals dominate the nonparametric intervals
(sometimes by as much as 40%) as well as the Huber-"t" intervals.

A constant of c=4 was also run for the biweight; efficiencies for
moderate contamination (<KU) are only slightly lower than when
c=6. The main message is that a robust "t" interval can lead to
large gains in efficiency in long-tailed, symmetric situations.


Table V

ECIL efficiencies for five different
"t"-confidence intervals

6.  AN  APPLICATION  AND  CONCLUSIONS. 

6.1 An example for borrowed and unborrowed "t" intervals.

To gain some familiarity with the effect of borrowing scales
on biweight confidence intervals, we calculate them for a set of
chemical measurements taken at the National Bureau of Standards.
These data consist of the concentrations of polychlorinated
biphenyl (PCB) in a motor oil solution as determined by gas
chromatography (in units of milligrams per kilogram of oil). Each
sample includes ten peak-by-peak comparisons of the oil fraction
chromatogram with the chromatogram of a known standard mixture.

The box plots of the data from four ampoules of solution are shown
in Figure 1. Notice that the underlying assumption of symmetry
does not seem unrealistic for these samples, and that some outliers
are evident from ampoule 4.


FIG. 1. Box Plots of Data from PCB's in oil.

While it appears that all four groups do not have a common
scale, one might reasonably borrow scales between batches 1 and 4.
If we are interested in all 6 pair-wise comparisons at the 95%
level of confidence, each interval should be based on the 2.5%/6 =
.4%-point of the "t" distribution (.9x18 = 16.2 d.f. for pooled
scales, .8x18 = 14.4 d.f. for unpooled scales). The pooled scale
(7) between ampoules 1 and 4 is .988, from which biweights and
associated variance estimates may be calculated to give a
confidence interval of the form

$$ (T_1 - T_4) \pm t_{16.2}(.004)\,(S_1^2 + S_4^2)^{1/2} $$
$$ = (99.468 - 106.858) \pm (3.028)(.0706 + .176)^{1/2} $$
$$ = -7.390 \pm 1.504 = (-8.894,\ -5.886). $$

(The corresponding Student's t interval, (-9.220, -4.026), is 1.7
times wider.) An unborrowed confidence interval for the difference
between ampoules 1 and 2 is

$$ (99.468 - 103.357) \pm t_{14.4}(.004)\,(.0706 + .587)^{1/2} $$
$$ = -3.889 \pm 2.493 = (-6.382,\ -1.396). $$

(Welch's (1949) unborrowed confidence interval, using the formula
for degrees of freedom on p. 295, is only trivially longer.)
Comparing ampoules 2 and 4 gives a confidence interval of the form

$$ (103.357 - 106.826) \pm t_{14.4}(.004)\,(.587 + .213)^{1/2} $$
$$ = -3.469 \pm 2.749 = (-6.218,\ -0.720). $$

This last comparison illustrates the greater power of this
procedure over the classical Student's t interval (-6.203, 0.423),
which would not reject the hypothesis of no difference. (Had one
used a Welch interval, since the equal-variance hypothesis is rejected
at level .10, it would have been even wider.)
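The interval arithmetic in this example can be checked directly. The sketch below takes the biweight estimates, the variance estimates, and the critical value 3.028 exactly as reported in the text; the helper names are illustrative, and the per-tail level follows the 2.5%/6 split over the six pairwise comparisons.

```python
import math


def conservative_df(n1, n2, pooled):
    """Rule of thumb from the text: .9(ndf) with pooled scales, .8(ndf) otherwise."""
    ndf = n1 + n2 - 2
    return (0.9 if pooled else 0.8) * ndf


def interval(t1, t2, var1, var2, critical_point):
    """Difference of locations plus or minus critical point times standard error."""
    diff = t1 - t2
    half_width = critical_point * math.sqrt(var1 + var2)
    return diff - half_width, diff + half_width


per_tail_level = 0.025 / 6                 # about .4% in each tail
df_pooled = conservative_df(10, 10, True)  # .9 x 18
df_unpooled = conservative_df(10, 10, False)  # .8 x 18

# Ampoules 1 and 4, pooled scales, reported critical value 3.028.
low, high = interval(99.468, 106.858, 0.0706, 0.176, 3.028)
```

To three decimals, `low` and `high` reproduce the reported interval (-8.894, -5.886). The critical value for 14.4 degrees of freedom is not printed in the text, so the two unborrowed intervals are not recomputed here.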

These intervals do not represent the final data summary
because additional information on the measurement process permits
more accuracy in determining standard errors. For illustrative
purposes, however, this information has been neglected.

6.2 Concluding comments for the two-sample case.

This study investigated the performance of a two-sample "t"
statistic when classical sample means and variance are replaced by
their biweight counterparts. Although it is computationally more
difficult than Student's t, the widespread use of computers makes this
disadvantage irrelevant. The primary advantage is that its
distribution can be well approximated by one from the Student's t
family, from which valid, yet efficient, confidence intervals for
the difference in centers can be made.

Appropriate scaling for biweight-"t" can be important. We can
choose to either pool estimates of scale (a wise move if in fact we
have common underlying situations), or use separate estimates
(slightly safer in cases of doubt). The distribution may be
matched to Student's t on .9(ndf) (out to .1%-point) in the former
case or .8(ndf) in the latter. In either case, the efficiency of
the procedure (in terms of relative length of the interval) is
upwards of 70%. The same applies when $n_1 \ne n_2$, if we weight the
variance estimates proportional to their sample sizes ("borrowed"
denominator). Small sample sizes (n=5) pose a problem only when
the underlying population is extremely heavy-tailed (e.g., Slash).

A few trials of unborrowed denominators were run in situations
where the samples did not have common width. For the most part,
the 0.8(ndf) matching is quite conservative; .9(ndf) could be
safely recommended for all but perhaps the most extreme percent
points (.01% and beyond). When the underlying situations have the
same width, we have better than 60% efficiency out to the .5%
point. When the situations are different (either in distribution
or in width), the efficiency decreases with the increased
difference in the distribution (in terms of the "heaviness" of the
tails).

While further insight into the nature of the weight
distribution may suggest refinements, present results indicate
that we may feel confident in constructing two-sample biweight-"t"
intervals using tabulated Student's t percent points as outlined
above. A subsequent report will investigate the performance of
biweight-"t" when the underlying populations are unsymmetric.